Attenuated Signal Test Example

This notebook demonstrates an example of how an attenuation test can be used to detect biofouling on an instrument.

The source data is derived from a historical USGS CTD station located at Lower Sand Island, OR in the Columbia River estuary. The selected time period shows the tidal influence on salinity over a spring-neap time period. Near the end of the selected period, there is a decrease in the range of salinity corresponding with biofouling.

The data was downloaded from the Center for Coastal Margin and Prediction (CMOP) Data Explorer.

Setup

[1]:
# Setup directories
from pathlib import Path
basedir = Path().absolute()
libdir = basedir.parent.parent.parent

# Other imports
import pandas as pd
import numpy as np
from datetime import datetime

import matplotlib.pyplot as plt

from bokeh.layouts import gridplot
from bokeh import plotting
plotting.output_notebook()
Loading BokehJS ...
[2]:
# # Install QC library
# !pip install git+git://github.com/ioos/ioos_qc.git

# Alternative installation (install specific branch):
!pip uninstall -y ioos_qc
!pip install git+git://github.com/ioos/ioos_qc.git@attenuated_signal_updates

# # Alternative installation (run with local updates):
# !pip uninstall -y ioos_qc
# import sys
# sys.path.append(str(libdir))

from ioos_qc.config import QcConfig
from ioos_qc import qartod
Found existing installation: ioos-qc 0.2.1
Uninstalling ioos-qc-0.2.1:
  Successfully uninstalled ioos-qc-0.2.1
Collecting git+git://github.com/ioos/ioos_qc.git@attenuated_signal_updates
  Cloning git://github.com/ioos/ioos_qc.git (to revision attenuated_signal_updates) to /tmp/pip-req-build-m8b9x358
  Running command git clone -q git://github.com/ioos/ioos_qc.git /tmp/pip-req-build-m8b9x358
  Running command git checkout -b attenuated_signal_updates --track origin/attenuated_signal_updates
  Switched to a new branch 'attenuated_signal_updates'
  Branch 'attenuated_signal_updates' set up to track remote branch 'attenuated_signal_updates' from 'origin'.
Requirement already satisfied: geojson in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from ioos-qc==0.2.1) (2.5.0)
Requirement already satisfied: netCDF4 in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from ioos-qc==0.2.1) (1.5.3)
Requirement already satisfied: numpy>=1.14 in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from ioos-qc==0.2.1) (1.18.1)
Requirement already satisfied: pygc in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from ioos-qc==0.2.1) (1.2.1)
Requirement already satisfied: ruamel.yaml in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from ioos-qc==0.2.1) (0.16.6)
Requirement already satisfied: simplejson in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from ioos-qc==0.2.1) (3.17.0)
Requirement already satisfied: xarray in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from ioos-qc==0.2.1) (0.15.0)
Requirement already satisfied: cftime in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from netCDF4->ioos-qc==0.2.1) (1.0.4.2)
Requirement already satisfied: ruamel.yaml.clib>=0.1.2; platform_python_implementation == "CPython" and python_version < "3.8" in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from ruamel.yaml->ioos-qc==0.2.1) (0.2.0)
Requirement already satisfied: pandas>=0.25 in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from xarray->ioos-qc==0.2.1) (1.0.1)
Requirement already satisfied: pytz>=2017.2 in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from pandas>=0.25->xarray->ioos-qc==0.2.1) (2019.3)
Requirement already satisfied: python-dateutil>=2.6.1 in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from pandas>=0.25->xarray->ioos-qc==0.2.1) (2.8.1)
Requirement already satisfied: six>=1.5 in /home/travis/miniconda/envs/test-environment/lib/python3.6/site-packages (from python-dateutil>=2.6.1->pandas>=0.25->xarray->ioos-qc==0.2.1) (1.14.0)
Building wheels for collected packages: ioos-qc
  Building wheel for ioos-qc (setup.py) ... - done
  Created wheel for ioos-qc: filename=ioos_qc-0.2.1-py3-none-any.whl size=31473 sha256=fc1e4cd42a9ae6fe82880fba025eb6e3b6b698e6082148a41c6701b6cfd5612a
  Stored in directory: /tmp/pip-ephem-wheel-cache-edrn7vl9/wheels/13/c5/25/415ac2cb261fab293c3847319b5f2f6fd3775ba3eb454500f5
Successfully built ioos-qc
Installing collected packages: ioos-qc
Successfully installed ioos-qc-0.2.1
[3]:
# Method to plot QC results using Bokeh
def plot_results(data, var_name, results, title, test_name):

    time = data.index
    obs = data[var_name]
    qc_test = results['qartod'][test_name]

    qc_pass = np.ma.masked_where(qc_test != 1, obs)
    qc_suspect = np.ma.masked_where(qc_test != 3, obs)
    qc_fail = np.ma.masked_where(qc_test != 4, obs)
    qc_notrun = np.ma.masked_where(qc_test != 2, obs)

    p1 = plotting.figure(x_axis_type="datetime", title=test_name + ' : ' + title)
    p1.grid.grid_line_alpha=0.3
    p1.xaxis.axis_label = 'Time'
    p1.yaxis.axis_label = 'Observation Value'

    p1.line(time, obs,  legend_label='obs', color='#A6CEE3')
    p1.circle(time, qc_notrun, size=2, legend_label='qc not run', color='gray', alpha=0.2)
    p1.circle(time, qc_pass, size=4, legend_label='qc pass', color='green', alpha=0.5)
    p1.circle(time, qc_suspect, size=4, legend_label='qc suspect', color='orange', alpha=0.7)
    p1.circle(time, qc_fail, size=6, legend_label='qc fail', color='red', alpha=1.0)
    p1.circle(time, qc_notrun, size=6, legend_label='qc not eval', color='gray', alpha=1.0)

    plotting.show(gridplot([[p1]], plot_width=800, plot_height=400))

## Run example

1. Load data and perform exploratory analysis

[4]:
# Historical salinity data from Lower Sand Island, Columbia River Estuary
# data: http://amb6400b.stccmop.org/ws/product/offeringplot_ctime.py?handlegaps=true&series=time,sandi.790.A.CTD.salt.PD0&width=8.54&height=2.92&starttime=2001-7-1 0:00&endtime=2001-09-5 23:59
# location same as sandi for CREOFS: https://tidesandcurrents.noaa.gov/ofs/ofs_station.shtml?stname=Lower%20Sand%20Island%20Light&ofs=cre&stnid=sandi&subdomain=ba
filename = basedir.joinpath('attenuated_salinity_example.csv')

data = pd.read_csv(filename, header=1, index_col='\'time PST\'', parse_dates=True)
data.plot()
[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fc6f3722710>
../_images/examples_QartodTestExample_SalinityAttenuation_7_1.png
[5]:
var_name = data.columns[0]
data[var_name]
[5]:
'time PST'
2001-08-15 19:06:55    27.37
2001-08-15 19:07:55    27.03
2001-08-15 19:08:55    26.78
2001-08-15 19:09:55    26.73
2001-08-15 19:10:55    27.10
                       ...
2001-09-01 20:51:57    15.71
2001-09-01 20:52:57    15.48
2001-09-01 20:53:57    15.46
2001-09-01 20:54:57    15.56
2001-09-01 20:55:57    14.61
Name:  sandi.790.A.CTD [psu], Length: 24436, dtype: float64

2. Plot range and standard deviation of salinity over M2 moving window

[6]:
# lunar day (M2)
time_delta = 3600 * 24.8
print(f'window: {time_delta}')

# start QC after a tidal data
# - 24.8 hours of data at 2 minute intervals
min_periods = 24.8*60/2
# panads uses phrase "min_periods" to indicate minimum number of observations
# - ioos_qc uses the phrase "min_obs"
print(f'min_obs: {min_periods}')
window: 89280.0
min_obs: 744.0
[7]:
# range check
range_data = data.rolling(f'{time_delta}S', min_periods=int(min_periods)).apply(np.ptp, raw=True)
range_data.plot()
[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fc6f358c518>
../_images/examples_QartodTestExample_SalinityAttenuation_11_1.png
[8]:
# check that min_{periods, obs} are NaN
# - note that N-1 are NaNs
range_data.isna().sum()
[8]:
 sandi.790.A.CTD [psu]    743
dtype: int64
[9]:
# stdev check
stdev_data = data.rolling(f'{time_delta}S', min_periods=int(min_periods)).apply(np.std, raw=True)
stdev_data.plot()
[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fc6f33fe240>
../_images/examples_QartodTestExample_SalinityAttenuation_13_1.png

3. Run QC and plot results

Test range

The beginning 743 points (min_obs) are marked as “NOT EVALUATED” because there is not enough data yet to evaluate whether they are pass or fail.

The range of the signal falls so quickly that no points are marked as “SUSPECT”, but immediately change from “PASS” to “FAIL”.

[10]:
# QC configuration
# This configuration is used to call the corresponding method in the ioos_qc library
# See documentation for description of each test and its inputs:
#   https://ioos.github.io/ioos_qc/api/ioos_qc.html#module-ioos_qc.qartod
qc_config = {
    'qartod': {
        "attenuated_signal_test": {
            "suspect_threshold": 17.25,
            "fail_threshold": 15,
            "test_period": int(time_delta),
            "min_obs": int(min_periods),
            "check_type": "range"
      }
    }
}
qc = QcConfig(qc_config)
qc_results = qc.run(
    inp=data[var_name],
    tinp=data.index.values
)

title = "Salinity [psu] : Lower Sand Island, OR"

p1 = plot_results(data, var_name, qc_results, title, 'attenuated_signal_test')
Test standard deviation

The “standard deviation” test picks up likely suspect data whereas the “range” test did not. The exemplifies the utility of using both tests in tandem.

[11]:
qc_config = {
    'qartod': {
        "attenuated_signal_test": {
            "suspect_threshold": 5,
            "fail_threshold": 4.5,
            "check_type": "std",
            "test_period": int(time_delta),
            "min_obs": int(min_periods),
      }
    }
}
qc = QcConfig(qc_config)
qc_results = qc.run(
    inp=data[var_name],
    tinp=data.index.values
)

title = "Salinity [psu] : Lower Sand Island, OR"
plot_results(data, var_name, qc_results, title, 'attenuated_signal_test')

## Sensitivity Tests

1. Test results sensitivity to min_obs

The following plots demonstrate the sensitivity of the test results in the beginning of a time series to the selection of min_obs.

min_obs: 0

No observations are marked suspect at the beginning of the time series, but the first 744 observations (the size of the rolling window) are labeled “FAILING”.

[12]:
qc_config = {
    'qartod': {
        "attenuated_signal_test": {
            "suspect_threshold": 5,
            "fail_threshold": 4.5,
            "check_type": "std",
            "test_period": int(time_delta),
            "min_obs": 0
      }
    }
}
qc = QcConfig(qc_config)
qc_results = qc.run(
    inp=data[var_name],
    tinp=data.index.values
)

title = "Salinity [psu] : Lower Sand Island, OR"
plot_results(data, var_name, qc_results, title, 'attenuated_signal_test')

min_obs: 10

Only the first 10 observations are marked as NOT EVALUATED, but the remainder of the first 744 samples are labeled as “FAILING”.

[13]:
qc_config = {
    'qartod': {
        "attenuated_signal_test": {
            "suspect_threshold": 5,
            "fail_threshold": 4.5,
            "check_type": "std",
            "test_period": int(time_delta),
            "min_obs": 10
      }
    }
}
qc = QcConfig(qc_config)
qc_results = qc.run(
    inp=data[var_name],
    tinp=data.index.values
)

title = "Salinity [psu] : Lower Sand Island, OR"
plot_results(data, var_name, qc_results, title, 'attenuated_signal_test')

min_obs: 100

[14]:
qc_config = {
    'qartod': {
        "attenuated_signal_test": {
            "suspect_threshold": 5,
            "fail_threshold": 4.5,
            "check_type": "std",
            "test_period": int(time_delta),
            "min_obs": 100
      }
    }
}
qc = QcConfig(qc_config)
qc_results = qc.run(
    inp=data[var_name],
    tinp=data.index.values
)

title = "Salinity [psu] : Lower Sand Island, OR"
plot_results(data, var_name, qc_results, title, 'attenuated_signal_test')

min_obs: 744

The first 744 samples are marked “NOT EVALUATED”, but none are marked as “FAILING”.

[15]:
qc_config = {
    'qartod': {
        "attenuated_signal_test": {
            "suspect_threshold": 5,
            "fail_threshold": 4.5,
            "check_type": "std",
            "test_period": int(time_delta),
            "min_obs": 744
      }
    }
}
qc = QcConfig(qc_config)
qc_results = qc.run(
    inp=data[var_name],
    tinp=data.index.values
)

title = "Salinity [psu] : Lower Sand Island, OR"
plot_results(data, var_name, qc_results, title, 'attenuated_signal_test')

2. Test results sensitivity to suspect_threshold.

suspect_threshold: 7

[16]:
qc_config = {
    'qartod': {
        "attenuated_signal_test": {
            "suspect_threshold": 7,
            "fail_threshold": 4.5,
            "check_type": "std",
            "test_period": int(time_delta),
            "min_obs": 744
      }
    }
}
qc = QcConfig(qc_config)
qc_results = qc.run(
    inp=data[var_name],
    tinp=data.index.values
)

title = "Salinity [psu] : Lower Sand Island, OR"
plot_results(data, var_name, qc_results, title, 'attenuated_signal_test')

suspect_threshold: 6

[17]:
qc_config = {
    'qartod': {
        "attenuated_signal_test": {
            "suspect_threshold": 6,
            "fail_threshold": 4.5,
            "check_type": "std",
            "test_period": int(time_delta),
            "min_obs": 744
      }
    }
}
qc = QcConfig(qc_config)
qc_results = qc.run(
    inp=data[var_name],
    tinp=data.index.values
)

title = "Salinity [psu] : Lower Sand Island, OR"
plot_results(data, var_name, qc_results, title, 'attenuated_signal_test')

suspect_threshold: 5

[18]:
qc_config = {
    'qartod': {
        "attenuated_signal_test": {
            "suspect_threshold": 5,
            "fail_threshold": 4.5,
            "check_type": "std",
            "test_period": int(time_delta),
            "min_obs": 744
      }
    }
}
qc = QcConfig(qc_config)
qc_results = qc.run(
    inp=data[var_name],
    tinp=data.index.values
)

title = "Salinity [psu] : Lower Sand Island, OR"
plot_results(data, var_name, qc_results, title, 'attenuated_signal_test')